-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Counting VM-allocated pages into heap size. #866
Conversation
Some VMs allocate memory outside the MMTk heap, using `malloc` or other allocation methods. Those memory can usually be reclaiming by the finalizers of dead object in the MMTk heap. This commit allows the VM to report the amount of such off-heap memory so that MMTk can trigger GC more promptly to reclaim such memory.
MSRV-related tests are failing, but we already worked around that issue previously. |
src/vm/collection.rs
Outdated
/// Return the amount of memory (in bytes) the VM allocated (but not released yet) outside the | ||
/// MMTk heap which can be released by doing GC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Return the amount of memory (in bytes) the VM allocated (but not released yet) outside the | |
/// MMTk heap which can be released by doing GC. | |
/// Return the amount of memory (in bytes) the VM allocated outside the | |
/// MMTk heap and would like to include in the MMTk heap size. Usually | |
/// it is memory that is kept alive by in-heap objects. |
The words about 'not released yet' and 'can be released' are confusing. I would rather use 'kept alive by heap objects'. Free feel to change the wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. "Kept alive by heap objects" sounds better.
src/vm/collection.rs
Outdated
/// elements using `malloc`, then those buffers should be included in this amount. When the GC | ||
/// finds such an array dead, its finalizer shall `free` the buffer and reduce this amount. | ||
/// | ||
/// This function is a hint for the MMTk core to trigger GC, therefore does not have to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not merely a hint. We use it to decide our actual heap size, and metrics like min heap for benchmarks. I think we can just remove this paragraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the "min heap size" is an empirical value, too. It depends on the plan, the number of CPUs (which determines the number of GC threads) and other specifics of the machine the program is running on. I think it is OK to say it is a hint, given that the exact set of "objects kept alive by heap objects" is not precisely defined. Another important point is that the VM should not spend too much time computing the exact value because this function is called very often (by gc_poll
). An approximate value should be good enough for MMTk to trigger GC promptly and keep the memory footprint under control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. The value does not need to be precise for the sake of performance. But I woudn't say it is a 'hint'. A 'hint' means something like an suggestion. Like Java's System.gc()
is a hint, we may do or may not do a GC. Julia's --heap-size-hint
is a hint, the actual heap size can be larger than the given value. But in our case, the number returned by the method is not a hint, and we do count it into our heap size and we will always count it -- that is the contract of this method, the memory is counted into our heap size.
You have another paragraph talking about performance. If you would like to emphasis further on the tradeoff between performance and being precise, you can expand that section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll remove the "hint" part and merge it with the paragraph about performance.
src/vm/collection.rs
Outdated
/// VM uses `malloc` and the implementation of `malloc` is opaque to the VM), the VM binding | ||
/// can simply return the total number of bytes of those off-heap objects as an approximation. | ||
/// | ||
/// The default implementation which returns 0 should also work if the VM only allocates a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph can be removed as well. You just need some comments in the default impl, to say that by default we assume VM does not allocate off-heap memory, or does not desire to include the off-heap memory in MMTk heap size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's reasonable.
src/vm/collection.rs
Outdated
/// MMTk heap which can be released by doing GC. | ||
/// | ||
/// MMTk core will add this amount to the amount of memory allocated or reserved in MMTk | ||
/// spaces, and will trigger a GC if the sum exceeds a certain limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to trigger a GC depends on the internal GC trigger code. You could simply say that MMTk core will consider the reported memory as part of MMTk heap for the purpose of heap size accounting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is the GC trigger that determines when to trigger GC, but I want the VM binding developer to know that the return value of this function is additive (rather than multiplicative or having other non-linear relation) to MMTk core's "used pages" when considered to trigger GC, so that the developer can have a basic idea of what effect the return value of this function will have on triggering GC. At least the current implementation is additive for both fixed heap size and the MemBalancer. The MemBalancer changes the heap limit dynamically, but the way to compute occupied pages is still adding the used pages of MMTk spaces and the result of this function. I can emphasise that being additive is only a detail of the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the VM binding developer to know that the return value of this function is additive (rather than multiplicative or having other non-linear relation) to MMTk core's "used pages" when considered to trigger GC
I think this is incorrect. GC triggering does not necessarily relate to heap sizes. We could implement a GC trigger that uses allocation volume -- in which case, we do not directly use the vm allocated pages returned by the method, we may use a diff of value or simply ignore it. Sticky immix does not trigger nursery GC based on heap sizes, LXR has similar behaviors as well.
mmtk-core/src/plan/sticky/immix/global.rs
Line 153 in 40c9773
self.immix.immix_space.get_pages_allocated() > self.options().get_max_nursery_pages(); |
This is what I suggest about the contract of this method.
MMTk core will consider the reported memory as part of MMTk heap for the purpose of heap size accounting.
But we do not guarantee that GC triggering is related with heap size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the second paragraph and actually give recommendations for VMs about what to include instead of saying "MMTk doesn't specify what to include except that number is considered by the GC trigger". In this way, newer GC triggers can decide whether to take that reported number into consideration when triggering GC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/vm/collection.rs
Outdated
/// elements using `malloc`, then those buffers should be included in this amount. When the GC | ||
/// finds such an array dead, its finalizer shall `free` the buffer and reduce this amount. | ||
/// | ||
/// If possible, the VM should account off-heap memory by pages. That is, count the number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// If possible, the VM should account off-heap memory by pages. That is, count the number of | |
/// If possible, the VM should account off-heap memory in pages. That is, count the number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe "in terms of pages used" is more appropriate. It is more verbose though. Maybe "as pages used" instead? Same for comment below
src/vm/collection.rs
Outdated
/// page granularity, the occupied pages (instead of individual objects) determine the memory | ||
/// footprint of a process, and how much memory MMTk spaces can obtain from the OS. | ||
/// | ||
/// However, if the VM is incapable of accouting off-heap memory by pages (for example, if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// However, if the VM is incapable of accouting off-heap memory by pages (for example, if the | |
/// However, if the VM is incapable of accounting off-heap memory in pages (for example, if the |
src/vm/collection.rs
Outdated
/// # Performance note | ||
/// | ||
/// This function will be called when MMTk polls for GC. It happens every time the MMTk | ||
/// allocators allocated a certain amount of memory, usually one or a few blocks. Because this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// allocators allocated a certain amount of memory, usually one or a few blocks. Because this | |
/// allocators have allocated a certain amount of memory, usually one or a few blocks. Because this |
let used_pages = self.get_used_pages(); | ||
let collection_reserve = self.get_collection_reserved_pages(); | ||
let vm_live_bytes = <Self::VM as VMBinding>::VMCollection::vm_live_bytes(); | ||
let vm_live_pages = conversions::bytes_to_pages_up(vm_live_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a one line comment here just saying that the live bytes may not be actual pages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's reasonable because the live bytes may be occupied bytes instead of actual pages, and may sometimes be an approximate value.
@k-sareen I have made changes according to your suggestions. |
src/plan/global.rs
Outdated
/// Get the number of pages that are reserved, including used pages and pages that will | ||
/// be used (e.g. for copying). | ||
/// Get the number of pages that are reserved, including pages used by MMTk spaces, pages that | ||
/// will be used (e.g. for copying), and live pages allocated by the VM as reported by the VM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not live pages allocated by the VM necessarily. Maybe slightly rephrase it to say something like "We also take into account the live bytes as reported by the VM. For more details, refer to the [vm_live_bytes
] function."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased it a bit.
Some VMs allocate memory outside the MMTk heap, using
malloc
or other allocation methods. Those memory can usually be reclaiming by the finalizers of dead object in the MMTk heap. This commit allows the VM to report the amount of such off-heap memory so that MMTk can trigger GC more promptly to reclaim such memory.to-do list:
Related work
The "malloc_counted_size" feature: Introduced in #608, that feature was intended for implementing malloc/free in MMTk's spaces, although the current implementation simply wraps the
malloc
function from libc. This PR, on the other hand, is agnostic of how the VM allocates memory outside the MMTk heap. This PR is useful if the VM already does similar accounting. For example, Ruby accounts for memory allocated byxmalloc
, a wrapper ofmalloc
.